
    Novel gradient-based methods for data distribution and privacy in data science

    With the growing need to store data at different locations, designing algorithms that can analyze distributed data is becoming more important. In this thesis, we present several gradient-based algorithms customized for data distribution and privacy. First, we propose a provably convergent, second-order, incremental, and inherently parallel algorithm that works with distributed data. By using a local quadratic approximation, we speed up convergence with the help of curvature information. We also illustrate that a parallel implementation of our algorithm outperforms a parallel stochastic gradient descent method on a large-scale data science problem. This first algorithm addresses the use of data residing at different locations; however, this setting is not necessarily enough for data privacy. To guarantee the privacy of the data, we propose differentially private optimization algorithms in the second part of the thesis. The first of these employs a smoothing approach based on weighted averages of the history of gradients, which decreases the variance of the added noise. This reduction is important for iterative optimization algorithms, since increased noise can harm performance. We also present a differentially private version of a recent multistage accelerated algorithm. These extensions use noise-related parameter selection, and the proposed stepsizes are proportional to the variance of the noisy gradient. Numerical experiments show that our algorithms perform better than some well-known differentially private algorithms
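    The smoothing idea described above can be sketched in a few lines: each private step clips the gradient to bound sensitivity, adds Gaussian noise, and then averages the noisy gradient with its predecessors. This is a minimal sketch of the general idea, not the thesis implementation; the parameter names (`clip`, `sigma`, `beta`) are illustrative assumptions.

```python
import numpy as np

def dp_smoothed_step(w, grad_fn, avg, rng, lr=0.05, clip=1.0, sigma=0.1, beta=0.7):
    """One private gradient step with smoothing: clip the gradient,
    add Gaussian noise, then update an exponentially weighted average
    of past noisy gradients and step along it."""
    g = grad_fn(w)
    g = g / max(1.0, np.linalg.norm(g) / clip)            # bound sensitivity
    g_noisy = g + rng.normal(0.0, sigma * clip, g.shape)  # Gaussian mechanism
    avg = beta * avg + (1.0 - beta) * g_noisy             # variance-reducing average
    return w - lr * avg, avg
```

    For an exponentially weighted average of independent noise draws, the stationary noise variance shrinks by roughly (1 - beta) / (1 + beta) relative to a single draw, which illustrates why averaging the gradient history helps an iterative private method.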

    Differentially Private Accelerated Optimization Algorithms

    We present two classes of differentially private optimization algorithms derived from well-known accelerated first-order methods. The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decrease the accumulated noise on the gradient steps required for differential privacy. The second class of algorithms is based on Nesterov's accelerated gradient method and its recent multi-stage variant. We propose a noise dividing mechanism for the iterations of Nesterov's method in order to improve the error behavior of the algorithm. Convergence rate analyses are provided for both the heavy ball and the Nesterov accelerated gradient methods with the help of dynamical system analysis techniques. Finally, we conclude with numerical experiments showing that the presented algorithms have advantages over well-known differentially private algorithms. (28 pages, 4 figures)
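    The noise dividing mechanism mentioned above can be illustrated with a simple accelerated loop in which a total noise budget is split evenly across iterations. The equal split, the fixed momentum, and the parameter names are simplifying assumptions made only to show the shape of the idea, not the paper's actual mechanism or calibration.

```python
import numpy as np

def dp_nesterov(grad_fn, w0, n_iters, total_sigma, lr=0.1, momentum=0.9, seed=0):
    """Nesterov-style accelerated gradient descent where each iteration
    receives an equal share of a total Gaussian-noise budget."""
    rng = np.random.default_rng(seed)
    sigma_t = total_sigma / np.sqrt(n_iters)   # per-iteration noise share
    w = np.array(w0, float)
    v = np.zeros_like(w)
    for _ in range(n_iters):
        lookahead = w + momentum * v           # gradient at the extrapolated point
        g = grad_fn(lookahead) + rng.normal(0.0, sigma_t, w.shape)
        v = momentum * v - lr * g
        w = w + v
    return w
```

    On a well-conditioned problem the accelerated iterates contract geometrically, so the accumulated per-step noise, rather than the optimization error, dominates the final accuracy; dividing the budget controls that accumulation.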

    Phylostat: a web-based tool to analyze paralogous clade divergence in phylogenetic trees

    Phylogenetic trees are useful tools for inferring evolutionary relationships between genetic entities. Phylogenetics enables not only evolution-based gene clustering but also the assignment of gene duplication and deletion events to tree nodes when coupled with statistical approaches such as bootstrapping. However, extensive gene duplication and deletion events complicate the interpretation of phylogenetic trees and require manual inference. In particular, there has been no robust method for determining whether one of the paralog clades systematically shows higher divergence following a gene duplication event, a sign of functional divergence. Here, we provide Phylostat, a graphical user interface that enables clade divergence analysis both visually and statistically. Phylostat is a web-based tool built on phylo.io for comparative clade divergence analysis; it is available at https://phylostat.adebalilab.org under an MIT open-source licence
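    The statistical question behind the tool, whether one paralog clade is systematically more diverged than its sister after a duplication, can be posed as a simple permutation test on per-tip root-to-tip distances. This is a minimal statistical sketch under assumed inputs, not Phylostat's exact procedure.

```python
import numpy as np

def clade_divergence_pvalue(tips_a, tips_b, n_perm=10000, seed=0):
    """One-sided permutation test: is clade A's mean root-to-tip
    divergence (branch-length sum from the duplication node) higher
    than sister clade B's more often than chance would allow?"""
    rng = np.random.default_rng(seed)
    a = np.asarray(tips_a, float)
    b = np.asarray(tips_b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel tips at random, keeping clade sizes
        if pooled[:a.size].mean() - pooled[a.size:].mean() >= observed:
            hits += 1
    return observed, hits / n_perm
```

    A small p-value indicates the divergence asymmetry between the two paralog clades is unlikely under random assignment of tips to clades.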

    PHACT: phylogeny-aware computing of tolerance for missense mutations

    Evolutionary conservation is a fundamental resource for predicting the substitutability of amino acids and the loss of function in proteins. Using a multiple sequence alignment alone, without considering the evolutionary relationships among sequences, results in the redundant counting of evolutionarily related alteration events as if they were independent. Here, we propose a new method, PHACT, that predicts the pathogenicity of missense mutations directly from the phylogenetic tree of proteins. PHACT travels through the nodes of the phylogenetic tree and evaluates the deleteriousness of a substitution based on the probability differences of ancestral amino acids between neighboring nodes in the tree. Moreover, PHACT assigns a weight to each node in the tree based on its distance to the query organism. For each potential amino acid substitution, the algorithm generates a score that is used to estimate the effect of the substitution on protein function. To analyze the predictive performance of PHACT, we performed various experiments over subsets of two datasets that include 3,023 proteins and 61,662 variants in total. The experiments demonstrated that our method outperformed widely used pathogenicity prediction tools (SIFT and PolyPhen-2) and achieved better predictive performance than other conventional statistical approaches presented in dbNSFP. The PHACT source code is available at https://github.com/CompGenomeLab/PHACT
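    The core computation described above, accumulating ancestral-probability differences along tree edges with distance-based node weights, can be sketched as follows. The 1/(1 + d) weighting and all names here are illustrative assumptions; the published method's exact weighting and traversal differ.

```python
import numpy as np

def phact_like_score(parent_probs, child_probs, dist_to_query, alt_aa):
    """Toy score in the spirit of PHACT: sum, over tree edges, the change
    in the ancestral probability of the alternative amino acid, with
    nodes closer to the query organism weighted more heavily.
    parent_probs, child_probs: (n_edges, 20) ancestral amino-acid
    probabilities at each edge's endpoints; dist_to_query: per-edge
    distance of the child node to the query; alt_aa: column index."""
    w = 1.0 / (1.0 + np.asarray(dist_to_query, float))      # node weights
    delta = (np.asarray(child_probs, float)[:, alt_aa]
             - np.asarray(parent_probs, float)[:, alt_aa])  # per-edge change
    return float(np.sum(w * delta))
```

    A substitution whose alternative amino acid gains probability only on edges far from the query contributes little, which is the intuition behind distance-weighting the nodes.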

    Synergy and selectivity of antifungal small molecule combinations

    While synergistic small molecule combinations are often sought for increased efficacy, the selectivity of antimicrobial compound combinations is also paramount to drug therapy. Using the yeast model organisms S. cerevisiae and C. albicans, we conducted in vitro checkerboard assays for drug interactions of all pairwise combinations of 12 antifungals. We assessed the concavity of isobolograms for growth inhibition as a metric for drug interactions. We found that drug interactions were significantly conserved between these species despite varied concentration-response relationships of single drugs. We developed a metric, the fractional selectivity index, which indicates the relative growth of two cell types in any given ratio of a drug combination. Based on this analysis, we found drug regimens with increased selectivity for growth inhibition of one yeast species relative to another. Our analysis suggests a framework for discovering fixed-dose drug combinations with increased efficacy and selectivity for specific pathogens
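    The fractional selectivity index described above compares the growth of two cell types at a given dose ratio of a combination. A hypothetical rendering of that idea, applied along the 1:1 diagonal of a checkerboard assay, might look like this (the exact published formulation may differ):

```python
import numpy as np

def fractional_selectivity(growth_target, growth_offtarget, floor=1e-9):
    """For matched dose points along one fixed-ratio diagonal of a
    checkerboard, return relative growth of the off-target species over
    the target species. Values above 1 indicate the combination spares
    the off-target cells while inhibiting the target."""
    t = np.clip(np.asarray(growth_target, float), floor, None)  # avoid /0
    return np.asarray(growth_offtarget, float) / t
```

    Scanning this index across dose ratios is one way to search for regimens that inhibit one yeast species while sparing the other, as the abstract describes.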

    A framework for parallel second order incremental optimization algorithms for solving partially separable problems

    We propose Hessian Approximated Multiple Subsets Iteration (HAMSI), a generic second-order incremental algorithm for solving large-scale, partially separable convex and nonconvex optimization problems. The algorithm is based on a local quadratic approximation and hence allows incorporating curvature information to speed up convergence. HAMSI is inherently parallel and scales nicely with the number of processors. We prove the convergence properties of our algorithm when the subset selection step is deterministic. Combined with techniques for effectively utilizing modern parallel computer architectures, we illustrate that a particular implementation of the proposed method based on L-BFGS updates converges more rapidly than a parallel gradient descent when both methods are used to solve large-scale matrix factorization problems. This performance gain comes only at the expense of using memory that scales linearly with the total size of the optimization variables. We conclude that HAMSI may be considered a viable alternative in many large-scale problems where first-order methods based on variants of gradient descent are applicable
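    The incremental second-order idea can be illustrated with a serial toy version: cycle over data subsets, build a damped local quadratic model from each subset's gradient and curvature, and jump to that model's minimizer. The real HAMSI uses shared L-BFGS curvature and parallel subset updates; this sketch uses exact subset Hessians purely for clarity.

```python
import numpy as np

def incremental_quadratic_pass(w, subsets, grad_fn, hess_fn, damping=0.1):
    """One serial pass of a simplified HAMSI-like scheme: for each data
    subset s, minimize the damped local quadratic model
    m(d) = g_s'd + 0.5 d'(H_s + damping*I)d and apply the step."""
    for s in subsets:
        g = grad_fn(w, s)
        H = hess_fn(w, s) + damping * np.eye(w.size)  # damping keeps H invertible
        w = w - np.linalg.solve(H, g)                 # minimizer of the local model
    return w
```

    On a partially separable least-squares problem, a few such passes already recover the solution, since each subset's quadratic model captures the subset's curvature exactly.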

    Modeling the impact of drug interactions on therapeutic selectivity

    While drugs can interact in both target and off-target cell types, a more favorable interaction in the target cell may nevertheless open a therapeutic window. Here, the authors show, using two yeast species as a model, that differential drug interactions can indeed shift this window of selectivity